Idarex: Formal Description of Multi-word Lexemes with Regular Expressions
Abstract
Most multi-word lexemes (MWLs) allow certain types of variation. This has to be taken into account in their description and in their recognition in texts. We propose describing their syntactic restrictions and their idiosyncratic peculiarities with local grammar rules, which at the same time express, in a general way, regularities valid for a whole class of MWLs. The local grammars can be written conveniently and compactly as regular expressions in the formalism IDAREX, which uses a two-level morphology. IDAREX allows various types of variables to be defined, and canonical and inflected word forms to be mixed in the regular expressions.
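To make the idea of mixing canonical and inflected forms concrete, here is a minimal Python sketch. It is not IDAREX syntax and does not use a two-level morphology; the token encoding, the helper names (`encode`, `surface`, `lemma`, `any_of`) and the example idiom are all assumptions chosen for illustration. Each token is assumed to be already analysed into a surface form, a lemma, and a part-of-speech tag, and the idiom is written as an ordinary regular expression over that encoding.

```python
import re

# Hypothetical toy analysis: each token is (surface form, lemma, part of speech).
def encode(tokens):
    """Serialise analysed tokens so that a plain regex can run over them."""
    return "".join(f"<{s}|{l}|{p}>" for s, l, p in tokens)

def surface(form):
    """Match one token by its exact inflected (surface) form."""
    return rf"<{re.escape(form)}\|[^|>]*\|[^|>]*>"

def lemma(base, pos):
    """Match one token by its canonical form, in any inflection."""
    return rf"<[^|>]*\|{re.escape(base)}\|{re.escape(pos)}>"

def any_of(pos):
    """A simple 'variable': any single token with the given part of speech."""
    return rf"<[^|>]*\|[^|>]*\|{re.escape(pos)}>"

ADJ = any_of("ADJ")

# "kick the bucket": the verb may inflect, adjectives may be inserted,
# but the noun is fixed to the singular surface form "bucket".
KICK_THE_BUCKET = re.compile(
    lemma("kick", "V") + surface("the") + f"(?:{ADJ})*" + surface("bucket")
)

def contains_mwl(tokens):
    """Return True if the analysed sentence contains the idiom."""
    return KICK_THE_BUCKET.search(encode(tokens)) is not None
```

In this sketch, `lemma("kick", "V")` accepts "kicked" or "kicks", while `surface("bucket")` rejects the plural "buckets", which is the kind of idiosyncratic restriction the abstract refers to. The adjective variable could be reused unchanged in the patterns of other idioms of the same class.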
Similar resources
Formal Description of Multi-Word Lexemes with the Finite-State Formalism IDAREX
Most multi-word lexemes (MWLs) allow certain types of variation. This has to be taken into account in their description and in their recognition in texts. We propose describing their syntactic restrictions and their idiosyncratic peculiarities with local grammar rules, which at the same time make it possible to express, in a general way, regularities valid for a whole class of MWLs. The local grammars can be...
Local Grammars for the Description of Multi-Word Lexemes and their Automatic Recognition in Texts
Most multi-word lexemes (MWLs) allow certain types of variation. This has to be taken into account in their description so that they can be recognized in texts. We propose describing their syntactic restrictions and their idiosyncratic peculiarities with local grammar rules, which at the same time make it possible to express regularities valid for a whole class of MWLs such as word order variation in Ger...
CoCoCo: Online Extraction of Russian Multiword Expressions
In the CoCoCo project we develop methods to extract multi-word expressions of various kinds—idioms, multi-word lexemes, collocations, and colligations—and to evaluate their linguistic stability in a common, uniform fashion. In this paper we introduce a Web interface, which provides the user with access to these measures, to query Russian-language corpora. Potential users of these tools include ...
Derivatives for Enhanced Regular Expressions
Regular languages are closed under a wealth of formal language operators. Incorporating such operators in regular expressions leads to concise language specifications, but the transformation of such enhanced regular expressions to finite automata becomes more involved. We present an approach that enables the direct construction of finite automata from regular expressions enhanced with further o...
One-unambiguity of regular expressions with numeric occurrence indicators
Regular expressions with numeric occurrence indicators are an extension of traditional regular expressions, which let the required minimum and the allowed maximum number of iterations of subexpressions be described with numeric parameters. We consider the problem of testing whether a given regular expression E with numeric occurrence indicators is 1-unambiguous or not. This condition means, inf...